| Nanjiang Shu | Can Hou | Yike Gong | Lei Liu | Yihan Hu |
|---|---|---|---|---|
Introduce yourself!
With programming, you can automate manual procedures.
# In Python
print("Hi, Python!")
/* In C++ */
#include <iostream>
int main() {
    std::cout << "Hi, Python!" << std::endl;
    return 0;
}
For example, use pandas to read, manipulate and write Excel files programmatically.

| Old versions | Python 3 |
|---|---|
| Python 1.0 - January 1994 | Python 3.0 - December 3, 2008 |
| Python 1.2 - April 10, 1995 | Python 3.1 - June 27, 2009 |
| Python 1.3 - October 12, 1995 | Python 3.2 - February 20, 2011 |
| Python 1.4 - October 25, 1996 | Python 3.3 - September 29, 2012 |
| Python 1.5 - December 31, 1997 | Python 3.4 - March 16, 2014 |
| Python 1.6 - September 5, 2000 | Python 3.5 - September 13, 2015 |
| Python 2.0 - October 16, 2000 | Python 3.6 - December 23, 2016 |
| Python 2.1 - April 17, 2001 | Python 3.7 - June 27, 2018 |
| Python 2.2 - December 21, 2001 | Python 3.8 - October 14, 2019 |
| Python 2.3 - July 29, 2003 | Python 3.9 - October 5, 2020 |
| Python 2.4 - November 30, 2004 | Python 3.10 - October 4, 2021 |
| Python 2.5 - September 19, 2006 | Python 3.11 - October 24, 2022 |
| Python 2.6 - October 1, 2008 | Python 3.12 - October 2, 2023 |
| Python 2.7 - July 3, 2010 | |
At the end of the course, you should be able to:
print(1 + 1)
print("Hello Python")
2
Hello Python
a = 1 + 1
print(a)
a = "Hello Python"
print(a)
2
Hello Python
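Values such as 1 and "Hello Python" that are written directly in Python code are called literals. A name that holds a value is called a variable.
Basic data types:
- int (integers): 42, -5, 1000
- float (floating-point numbers): 3.14, -0.002, 10.0
- str (strings): "Hello Python", "ATCG"
- bool (Booleans): True and False
- NoneType: None

a = 1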
a = "ATCG"
a = True
a = None
# Can you tell the types of them?
sequence_length = 200
scale = 2.5
gene_id = "ABC12345"
is_DNA = False
Use the type() function to determine the type of a variable
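The cell below is a minimal sketch of type() applied to the four variables defined above. They are redefined so the cell runs on its own. The printed class names are what CPython reports for these built-in types.

```python
# Redefine the example variables so this cell is self-contained
sequence_length = 200
scale = 2.5
gene_id = "ABC12345"
is_DNA = False

# type() returns the class of the value that a variable refers to
print(type(sequence_length))  # <class 'int'>
print(type(scale))            # <class 'float'>
print(type(gene_id))          # <class 'str'>
print(type(is_DNA))           # <class 'bool'>
print(type(None))             # <class 'NoneType'>
```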
Use the print() function to display the value of a variable
sequence_length = 200
print(sequence_length)
seq_len = 200
seq_lens = [100, 150, 200] # a list
print(seq_lens[1])
seq_lens = (100, 150, 200) # a tuple
print(seq_lens[1])
li = [100, 150, None, "ATCG", 3.1415, seq_len, seq_lens]
li
li_seqlens = [100, 150, 200]
tu_seqlens = (100, 150, 200)
li_seqlens[1] = 500
print(li_seqlens)
tu_seqlens[1] = 500
----> 1 tu_seqlens[1] = 500
TypeError: 'tuple' object does not support item assignment
gene_ids = {"TP53", "COX2", "EGFR", "MTOR"} # a set
gene_ids
{'COX2', 'EGFR', 'MTOR', 'TP53'}
# set is unordered
gene_ids = {"1", "2", "3", "4", "5"}
for e in gene_ids:
    print(e)
# set has unique element
gene_ids = {"1", "1", "2", "2", "3"}
print(gene_ids)
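A set keeps only unique elements, so converting a list to a set is a common way to drop duplicates. The list of IDs below is made up for illustration. Because a set is unordered, sorted() is used to get a predictable display order.

```python
# Hypothetical list of gene IDs that contains duplicates
gene_id_list = ["TP53", "EGFR", "TP53", "MTOR", "EGFR"]

# set() keeps one copy of each element; sorted() returns them as an ordered list
unique_ids = sorted(set(gene_id_list))
print(unique_ids)  # ['EGFR', 'MTOR', 'TP53']

# Membership tests with "in" also work on sets
print("TP53" in set(gene_id_list))  # True
```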
sequence_info = { # a dictionary
"gene": "TP53",
"species": "Homo sapiens",
"length": 2000
}
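A dictionary maps keys to values. The sketch below shows lookup, insertion and iteration on the example above. The added "chromosome" key is only illustrative.

```python
sequence_info = {
    "gene": "TP53",
    "species": "Homo sapiens",
    "length": 2000,
}

# Look up a value by its key
print(sequence_info["gene"])  # TP53

# Add a new key-value pair (dictionaries are mutable); illustrative key
sequence_info["chromosome"] = "17"

# .get() returns a default value instead of raising KeyError for a missing key
print(sequence_info.get("strand", "unknown"))  # unknown

# Iterate over key-value pairs
for key, value in sequence_info.items():
    print(key, "->", value)
```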
The result of an operation depends on the types of its operands.
8/2
8
1 + 1.5
float
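Checking results with type() makes the effect of operand types visible. The sketch below covers a few common cases: "/" always returns a float, "//" on two ints returns an int, mixing int and float gives a float, and "+" on strings concatenates them.

```python
true_div = 8 / 2      # "/" always produces a float
floor_div = 8 // 2    # "//" is floor division; int // int gives an int
mixed = 1 + 1.5       # int + float is promoted to float
joined = "AT" + "CG"  # "+" concatenates strings

print(true_div, type(true_div))    # 4.0 <class 'float'>
print(floor_div, type(floor_div))  # 4 <class 'int'>
print(mixed, type(mixed))          # 2.5 <class 'float'>
print(joined)                      # ATCG
```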
result = 0.1 + 0.2 - 0.3
print(result)
print(result == 0.0)
print(abs(result - 0.0) < 1e-6)
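The standard library's math.isclose() performs the same tolerance-based comparison. When comparing against 0.0, an absolute tolerance (abs_tol) has to be supplied. The default relative tolerance scales with the magnitudes of the inputs, so it never treats a nonzero number as close to zero.

```python
import math

result = 0.1 + 0.2 - 0.3

# Near 0.0 the relative tolerance is useless, so pass an absolute tolerance
print(math.isclose(result, 0.0, abs_tol=1e-9))  # True
print(math.isclose(result, 0.0))                # False (relative tolerance only)

# For values of ordinary size the default relative tolerance is enough
print(math.isclose(0.1 + 0.2, 0.3))             # True
```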
"protein"[1:4]
"a" == "b"
1 + 1.5
# Guess what will be the result for this?
1 + True
2
Reserved keywords such as global cannot be used as variable names:
global = 5
    global = 5
SyntaxError: invalid syntax
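The standard library's keyword module can tell you whether a name is reserved before you try to use it. The exact contents of keyword.kwlist depend on the Python version, so the sketch prints its length without asserting a particular value.

```python
import keyword

# keyword.kwlist holds the reserved words of the running Python version
print(len(keyword.kwlist))

# iskeyword() checks a single name
print(keyword.iskeyword("global"))   # True
print(keyword.iskeyword("gene_id"))  # False
```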
- Use descriptive names: gene_id instead of gi.
- Follow naming conventions for Python, e.g., snake_case: gene_name, sequence_length, sample_id.
- Prefix Booleans with is, has, can, etc.: is_high_quality, has_mutations.

# display the value with print()
result = "ACCCG" * 5
print(result)
ACCCGACCCGACCCGACCCGACCCG
# show the type of value with type()
print(type(result))
<class 'str'>
# convert float value to string value with str()
str(2.5)
'2.5'
sequence = "ATGCTACGATaCG"
len(sequence)
seq_lens = [100, 200, 300]
len(seq_lens)
# can you get the length of an integer
len(3)
----> 2 len(3)
TypeError: object of type 'int' has no len()
read_counts = [1500, 2000, 1750, 2250, 1900, 2500]
print("total_reads:", sum(read_counts))
total_reads: 11900
expression_levels = [2.5, 3.6, 4.2, 5.0, 3.8, 3.8, 9.5, 100.1]
print("Max expression level:", max(expression_levels))
print("Min expression level:", min(expression_levels))
min(3)  # min() expects an iterable such as a list, so this line raises TypeError
Max expression level: 100.1
Min expression level: 2.5
----> 4 min(3)
TypeError: 'int' object is not iterable
print("Average expression level: ", sum(expression_levels)/len(expression_levels))
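The standard library's statistics module computes the mean and the median directly. With these example values, the single large value 100.1 pulls the mean up, while the median (the middle of the sorted values) stays close to the typical levels.

```python
import statistics

expression_levels = [2.5, 3.6, 4.2, 5.0, 3.8, 3.8, 9.5, 100.1]

mean_level = statistics.mean(expression_levels)      # sensitive to the outlier 100.1
median_level = statistics.median(expression_levels)  # average of the two middle values

print("Mean expression level:", mean_level)
print("Median expression level:", median_level)  # 4.0
```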
read_counts = [1500, 2000, 1750, 2250, 1900, 2500]
sorted_read_counts = sorted(read_counts)
print(sorted_read_counts)
[1500, 1750, 1900, 2000, 2250, 2500]
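sorted() returns a new list and leaves the original untouched. The list method .sort() reorders the list in place, which works because lists are mutable, and it returns None. A short sketch of both, plus reverse=True for descending order:

```python
read_counts = [1500, 2000, 1750, 2250, 1900, 2500]

# sorted() builds a NEW list; the original keeps its order
descending = sorted(read_counts, reverse=True)
print(descending)   # [2500, 2250, 2000, 1900, 1750, 1500]
print(read_counts)  # [1500, 2000, 1750, 2250, 1900, 2500]

# list.sort() sorts in place and returns None
read_counts.sort()
print(read_counts)  # [1500, 1750, 1900, 2000, 2250, 2500]
```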
seq_len1 = 150
seq_len2 = 181
seq_len1 <= seq_len2
True
freq1 = 0.51
freq2 = 1.5
freq1 > 0.5 and freq2 > 0.5
True
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"] # a list
"TP53" in gene_ids
length = 500
species = "Mouse"
read_count = 100
# I want to evaluate the condition that length is larger than 300 or species is Mouse,
# and read_count is larger than 200
# Expected value: False
length > 300 or species == "Mouse" and read_count > 200
True
(length > 300 or species == "Mouse") and read_count > 200
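Because and has higher precedence than or, the first expression is evaluated as length > 300 or (species == "Mouse" and read_count > 200), which is True. Use parentheses () to make your intended grouping explicit and improve readability.
Lists and strings are ORDERED collections of elements in which every element can be accessed through an index.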
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mylist[2]
mylist[1:3]
mylist[0:9:2] # [start, stop, step]
mylist[3:] # from 4th position to the end
mylist[:5] # from the beginning to the 5th position
mylist[:] # the same as mylist[::], mylist[::1]
li = [1, 2, 3, 4, 5]

| Element | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Positive index | 0 | 1 | 2 | 3 | 4 |
| Negative index | -5 | -4 | -3 | -2 | -1 |

Equation: positive_index = len(li) + negative_index
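The equation can be checked directly. The loop below walks over every negative index of the example list li and confirms that it points at the same element as the positive index len(li) + negative_index.

```python
li = [1, 2, 3, 4, 5]

# range(-len(li), 0) produces the negative indices -5, -4, -3, -2, -1
for negative_index in range(-len(li), 0):
    positive_index = len(li) + negative_index
    # Both indices must refer to the same element
    assert li[negative_index] == li[positive_index]
    print(negative_index, "->", positive_index, ":", li[negative_index])
```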
mylist[-1] # return the last element, equivalent to mylist[8]
# What will be the result for this?
mylist[::-1]
mystr = "123456789"
print("mystr[2] \t= ", mystr[2] )
print("mystr[1:3] \t= ", mystr[1:3])
print("mystr[0:9:2] \t= ", mystr[0:9:2])
print("mystr[3:] \t= ", mystr[3:])
print("mystr[:5] \t= ", mystr[:5])
print("mystr[:] \t= ", mystr[:])
print("mystr[-1] \t= ", mystr[-1])
print("mystr[::-1] \t= ", mystr[::-1])
Mutable objects can be altered after creation, while immutable objects can't.
| Immutable objects | Mutable objects |
|---|---|
| int | list |
| float | set |
| bool | dict |
| str | |
| tuple | |
# list is mutable
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(mylist)
mylist[2] = 7
print(mylist)
# string is immutable
mystr = "123456789"
mystr[2] = "7"
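Because strings are immutable, "changing" a character really means building a new string. A minimal sketch using slicing and concatenation, and then str.replace(), which also returns a new string:

```python
mystr = "123456789"

# Slicing and concatenation create a NEW string; mystr itself is unchanged
new_str = mystr[:2] + "7" + mystr[3:]
print(new_str)  # 127456789
print(mystr)    # 123456789

# str.replace() also returns a new string (it replaces every occurrence)
replaced = mystr.replace("3", "7")
print(replaced)  # 127456789
```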
Why are int, float and bool immutable?
a = 5
print("a=", a)
a = 6
print("a=", a)
# id(var) returns the identity of the object that var refers to; in CPython this is its memory address.
a = 5
print("memory address of a = ", id(a), ", value of a = ", a)
a = 6
print("memory address of a = ", id(a), ", value of a = ", a)
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print("memory address of mylist = ", id(mylist), ", value of mylist = ", mylist)
mylist[2] = 7
print("memory address of mylist = ", id(mylist), ", value of mylist = ", mylist)
memory address of mylist = 4469238464 , value of mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
memory address of mylist = 4469238464 , value of mylist = [1, 2, 7, 4, 5, 6, 7, 8, 9]
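Because a list is mutable and keeps its id() when modified, two names can refer to the same list object. A change made through one name is then visible through the other. In the sketch below, the names alias and copied are illustrative. .copy() creates a separate list.

```python
mylist = [1, 2, 3]
alias = mylist          # no copy: both names refer to the same list object
copied = mylist.copy()  # a new list object with the same elements

alias.append(4)         # modifies the shared object

print(mylist)                    # [1, 2, 3, 4]
print(copied)                    # [1, 2, 3]
print(id(alias) == id(mylist))   # True
print(id(copied) == id(mylist))  # False
```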
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mylist.append(15)
print(mylist)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 15]
mylist.remove(15)
print(mylist)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
del mylist[2]
print(mylist)
[1, 2, 4, 5, 6, 7, 8, 9]
Use parentheses () for clarity!
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]
# How do we print all IDs, one per line?
print(gene_ids[0])
print(gene_ids[1])
print(gene_ids[2])
print(gene_ids[3])
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]
for gene_id in gene_ids:
    print(gene_id)
Note the INDENT of the for loop
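for gene_id in gene_ids:
    print(gene_id)
# Both print() calls are indented, so both run once per ID
for gene_id in gene_ids:
    print("==ID==")
    print(gene_id)
# Only the first print() is inside the loop; print(gene_id) runs once, after the loop ends
for gene_id in gene_ids:
    print("==ID==")
print(gene_id)
# The loop body is not indented, so Python raises IndentationError
for gene_id in gene_ids:
print("==ID==")
    print(gene_id)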
For loop
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]
for gene_id in gene_ids:
    print(gene_id)
While loop
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]
i = 0
while i < len(gene_ids):
    print(gene_ids[i])
    i += 1
print("== When loop ends, i =", i)
For loop
A control flow statement that repeats a block of code a known number of times, e.g., once for each element of a list.
While loop
A control flow statement that repeats a block of code as long as a given Boolean condition is True.
Which one to use?
For loops are better for simple iterations over lists and other iterable objects.
While loops are more flexible and can iterate an unspecified number of times.
while loop
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]
i = 0
while i < len(gene_ids) and not gene_ids[i].startswith("E"):
    print(gene_ids[i])
    i += 1
print("== When loop ends, i =", i)
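For comparison, the same "stop at the first ID starting with E" task can be written as a for loop that leaves early with break. This is one common alternative, shown as a sketch. The printed list exists only so the result can be checked.

```python
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]

printed = []
for gene_id in gene_ids:
    # break exits the loop immediately once the stopping condition is met
    if gene_id.startswith("E"):
        break
    print(gene_id)
    printed.append(gene_id)
```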
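# Warning: the condition is always True, so this loop never ends;
# interrupt the kernel (or press Ctrl+C) to stop it
while True:
    print("yes")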
Note: the built-in function range() is especially useful in for loops.
range() function
for i in range(10):
    print(i)
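range() also accepts a start value and a step, range(start, stop, step), and the stop value is always excluded. Wrapping range() in list() shows the numbers it produces. Combined with len(), it yields the valid indices of a list.

```python
first_five = list(range(5))        # [0, 1, 2, 3, 4]
two_to_five = list(range(2, 6))    # [2, 3, 4, 5]
every_third = list(range(0, 10, 3))  # [0, 3, 6, 9]
countdown = list(range(5, 0, -1))  # [5, 4, 3, 2, 1]
print(first_five, two_to_five, every_third, countdown)

# range(len(...)) gives the valid indices 0 .. len - 1
gene_ids = ["TP53", "COX2", "EGFR", "MTOR"]
for i in range(len(gene_ids)):
    print(i, gene_ids[i])
```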
file_basename = "dnaseq"
for i in range(10):
    seqfile = file_basename + "_" + str(i) + ".fa"
    print("Analyzing " + seqfile)
- Python has two kinds of loops: for loops and while loops.
- for loops are better suited for iterating over lists and other iterable objects when the number of iterations is known or finite.
- while loops offer more flexibility and can iterate an unspecified number of times, as they continue until a specified condition is no longer true.
- The range() function is useful for programming with loops.

Python uses if, elif, and else for conditionals:
if condition1:
    # executed if condition1 is True
    pass
elif condition2:
    # executed if condition1 is False and condition2 is True
    pass
else:
    # executed if both condition1 and condition2 are False
    pass
if statement
The if statement is the fundamental control statement: it lets Python execute a block of code only when a condition is True.
dna_sequence = "AGTCTCG"
if 'N' not in dna_sequence:
    print("Valid DNA sequence.")
else statement
The else statement follows an if and defines what to do when the if condition is not met.
expression_level = 35
if expression_level > 50:
    print("Gene is overexpressed.")
else:
    print("Gene is not overexpressed.")
elif statement
elif (short for "else if") allows chaining of conditional statements.
expression_level = 35
if expression_level > 100:
    print("Gene is overexpressed.")
elif expression_level > 30 and expression_level <= 100:
    print("Gene is expressed.")
else:
    print("Gene is underexpressed.")
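Wrapping the same thresholds in a function makes it easy to try several values. Python also allows chained comparisons such as 30 < x <= 100, which reads like the mathematical notation. The function name classify_expression is hypothetical.

```python
def classify_expression(expression_level):
    """Return a category using the same thresholds as the elif example."""
    if expression_level > 100:
        return "overexpressed"
    elif 30 < expression_level <= 100:  # chained comparison
        return "expressed"
    else:
        return "underexpressed"

for level in [150, 100, 35, 30, 5]:
    print(level, classify_expression(level))
```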
# Use nested conditionals to categorize genetic variants based on multiple attributes.
genotype = "AG"
phenotype = "expressed"
if genotype == "AG":
    # Only check phenotype if genotype is "AG"
    if phenotype == "expressed":
        print("Variant " + genotype + " is active and expressed.")
    else:
        print("Variant " + genotype + " is active but not expressed.")
else:
    print("Variant " + genotype + " is a non-target variant.")
The if, elif, and else statements are powerful tools for controlling program logic. if/elif/else statements can be nested; however, avoid excessive nesting, as it makes code difficult to read and maintain.
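One way to remove a level of nesting from the genotype example is to combine conditions with and and to check the most specific case first. The sketch below should produce the same messages as the nested version. It stores the message in a variable so the result is easy to inspect.

```python
genotype = "AG"
phenotype = "expressed"

# Most specific case first, then the broader ones
if genotype == "AG" and phenotype == "expressed":
    message = "Variant " + genotype + " is active and expressed."
elif genotype == "AG":
    message = "Variant " + genotype + " is active but not expressed."
else:
    message = "Variant " + genotype + " is a non-target variant."

print(message)
```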